Jet substructure as a new Higgs search channel at the LHC
It is widely considered that, for Higgs boson searches at the Large Hadron Collider, WH and ZH
production where the Higgs boson decays to $b\bar{b}$ are poor search channels due to large backgrounds.
We show that at high transverse momenta, employing state-of-the-art jet reconstruction and
decomposition techniques, these processes can be recovered as promising search channels for the standard
model Higgs boson around 120 GeV in mass.



A key aim of the Large Hadron Collider (LHC) at 
CERN is to discover the Higgs boson, the particle at the 
heart of the standard-model (SM) electroweak symmetry 
breaking mechanism. Current electroweak fits, together 
with the LEP exclusion limit, favour a light Higgs boson, 
i.e. one around 120 GeV in mass [1]. This mass region
is particularly challenging for the LHC experiments, and
any SM Higgs-boson discovery is expected to rely on a
combination of several search channels, including gluon
fusion $\to H \to \gamma\gamma$, vector boson fusion, and associated
production with $t\bar{t}$ pairs [2, 3].

Two significant channels that have generally been considered
less promising are those of Higgs-boson production
in association with a vector boson, $pp \to WH, ZH$,
followed by the dominant light Higgs boson decay, to two
b-tagged jets. If there were a way to recover the WH and
ZH channels it could have a significant impact on Higgs
boson searches at the LHC. Furthermore these two channels
also provide unique information on the couplings of
a light Higgs boson separately to W and Z bosons.

Reconstructing W or Z associated $H \to b\bar{b}$ production
would typically involve identifying a leptonically decaying
vector boson, plus two jets tagged as containing b-mesons.
Two major difficulties arise in a normal search
scenario. The first is related to detector acceptance: leptons
and b-jets can be effectively tagged only if they are
reasonably central and of sufficiently high transverse momentum.
The relatively low mass of the VH (i.e. WH or
ZH) system means that in practice it can be produced
at rapidities somewhat beyond the acceptance, and it is
also not unusual for one or more of the decay products
to have too small a transverse momentum. The second
issue is the presence of large backgrounds with intrinsic
scales close to a light Higgs mass. For example, $t\bar{t}$
events can produce a leptonically decaying W, and in
each top-quark rest frame, the b-quark has an energy of
~ 65 GeV, a value uncomfortably close to the $m_H/2$ that
comes from a decaying light Higgs boson. If the second
W-boson decays along the beam direction, then such a
$t\bar{t}$ event can be hard to distinguish from a WH signal
event.

In this letter we investigate VH production in a
boosted regime, in which both bosons have large transverse
momenta and are back-to-back. This region corresponds
to only a small fraction of the total VH cross
section (about 5% for $p_T > 200$ GeV), but it has several
compensating advantages: (i) in terms of acceptance, the
larger mass of the VH system causes it to be central, and
the transversely boosted kinematics of the V and H ensures
that their decay products will have sufficiently large
transverse momenta to be tagged; (ii) in terms of backgrounds,
it is impossible for example for an event with
on-shell top-quarks to produce a high-$p_T$ $b\bar{b}$ system and
a compensating leptonically decaying W, without there
also being significant additional jet activity; (iii) the HZ
with $Z \to \nu\bar{\nu}$ channel becomes visible because of the large
missing transverse energy.

One of the keys to successfully exploiting the boosted 
VH channels will lie in the use of jet-finding geared to 
identifying the characteristic structure of a fast-moving 
Higgs boson that decays to $b$ and $\bar{b}$ in a common
neighbourhood in angle. We will therefore start by describing
the method we adopt for this, which builds on previous
work on heavy Higgs decays to boosted W's [4], WW
scattering at high energies [5] and the analysis of SUSY
decay chains [6]. We shall then proceed to discuss event
generation, our precise cuts and finally show our results. 

When a fast-moving Higgs boson decays, it produces
a single fat jet containing two b quarks. A successful
identification strategy should flexibly adapt to the fact
that the $b\bar{b}$ angular separation will vary significantly with
the Higgs $p_T$ and decay orientation, roughly

$$R_{b\bar{b}} \simeq \frac{1}{\sqrt{z(1-z)}}\,\frac{m_h}{p_T}, \qquad (p_T \gg m_h), \tag{1}$$

where $z$, $1 - z$ are the momentum fractions of the two
quarks. In particular one should capture the $b$, $\bar{b}$ and any
gluons they emit, while discarding as much contamination
as possible from the underlying event (UE), in order
to maximise resolution on the jet mass. One should also
correlate the momentum structure with the directions of
the two b-quarks, and provide a way of placing effective
cuts on the $z$ fractions, both of these aspects serving to
eliminate backgrounds.
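As a rough numerical illustration of eq. (1), the following Python sketch evaluates the approximate $b\bar{b}$ separation for a few transverse momenta. The function name `bb_separation` and the sample values are illustrative assumptions rather than part of the analysis, and the formula is only meaningful for $p_T \gg m_h$.

```python
import math


def bb_separation(m_h, p_t, z=0.5):
    """Approximate b-bbar angular separation from eq. (1), for p_t >> m_h.

    m_h and p_t are in GeV; z is the momentum fraction of one of the quarks.
    """
    if not 0.0 < z < 1.0:
        raise ValueError("z must lie strictly between 0 and 1")
    return m_h / (p_t * math.sqrt(z * (1.0 - z)))


# Symmetric decays (z = 1/2) of a 115 GeV Higgs at a few transverse momenta.
for p_t in (200.0, 300.0, 500.0):
    print(p_t, round(bb_separation(115.0, p_t), 3))
```

For $m_h = 115$ GeV and $p_T = 200$ GeV a symmetric decay gives $R_{b\bar{b}} \simeq 1.15$; asymmetric decays (z away from 1/2) give larger separations, which is why a fixed small jet radius would lose part of the signal.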

FIG. 1: The three stages of our jet analysis: starting from a hard massive jet on angular scale R, one identifies the Higgs
neighbourhood within it by undoing the clustering (effectively shrinking the jet radius) until the jet splits into two subjets
each with a significantly lower mass; within this region one then further reduces the radius to $R_{\rm filt}$ and takes the three hardest
subjets, so as to filter away UE contamination while retaining hard perturbative radiation from the Higgs decay products.

To flexibly resolve different angular scales we use the
inclusive, longitudinally invariant Cambridge/Aachen
(C/A) algorithm [7, 8]: one calculates the angular distance
$\Delta R_{ij}^2 = (y_i - y_j)^2 + (\phi_i - \phi_j)^2$ between all pairs of
objects (particles) i and j, recombines the closest pair,
updates the set of distances and repeats the procedure
until all objects are separated by a $\Delta R_{ij} > R$, where R
is a parameter of the algorithm. It provides a hierarchical
structure for the clustering, like the $k_t$ algorithm [9, 10],
but in angles rather than in relative transverse momenta
(both are implemented in FastJet 2.3 [11]).
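The clustering loop described above can be sketched in a few lines of Python. This is a toy $O(N^3)$ illustration and not FastJet's implementation; the helper names, the use of plain `(px, py, pz, E)` tuples, and the recombination by four-momentum addition are assumptions made for the example.

```python
import math
from itertools import combinations


def massless(pt, y, phi):
    """Build a massless four-vector (px, py, pz, E) from pt, rapidity, azimuth."""
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(y), pt * math.cosh(y))


def rapidity(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))


def delta_r2(a, b):
    """Squared angular distance (y_a - y_b)^2 + (phi_a - phi_b)^2."""
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2


def cambridge_aachen(particles, radius):
    """Toy inclusive C/A: merge the closest pair until all pairs have dR > radius."""
    jets = list(particles)
    while len(jets) > 1:
        i, j = min(combinations(range(len(jets)), 2),
                   key=lambda ij: delta_r2(jets[ij[0]], jets[ij[1]]))
        if delta_r2(jets[i], jets[j]) > radius ** 2:
            break
        # Recombine by adding four-momenta (an assumed recombination scheme).
        merged = tuple(a + b for a, b in zip(jets[i], jets[j]))
        jets = [p for k, p in enumerate(jets) if k not in (i, j)] + [merged]
    return jets


# Two nearby particles and one in the opposite hemisphere.
event = [massless(100.0, 0.0, 0.0), massless(80.0, 0.3, 0.2),
         massless(50.0, 0.0, 3.0)]
jets = cambridge_aachen(event, radius=1.2)
print(len(jets))
```

With R = 1.2 the two nearby particles merge into one jet and the third stays separate; with a much smaller radius nothing is merged. Because only angles enter the distance, undoing the last merge of a jet always exposes its widest-angle splitting, which is the property the decomposition procedure relies on.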

Given a hard jet j, obtained with some radius R, we
then use the following new iterative decomposition procedure
to search for a generic boosted heavy-particle decay.
It involves two dimensionless parameters, $\mu$ and $y_{\rm cut}$:

1. Break the jet j into two subjets by undoing its last
stage of clustering. Label the two subjets $j_1$, $j_2$ such
that $m_{j_1} > m_{j_2}$.

2. If there was a significant mass drop (MD), $m_{j_1} < \mu m_j$,
and the splitting is not too asymmetric,
$y = \frac{\min(p_{tj_1}^2, p_{tj_2}^2)}{m_j^2} \Delta R_{j_1,j_2}^2 > y_{\rm cut}$,
then deem j to be the
heavy-particle neighbourhood and exit the loop.
Note that $y \simeq \min(p_{tj_1}, p_{tj_2})/\max(p_{tj_1}, p_{tj_2})$ (see also footnote 1).

3. Otherwise redefine j to be equal to $j_1$ and go back
to step 1.
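Steps 1 to 3 can be sketched as a short loop over a clustering tree. The following Python is a minimal illustration under stated assumptions: the `Jet` class, the hand-built tree, and the helper names are hypothetical, and in a real analysis the tree would come from the C/A clustering history rather than from explicit `merge` calls.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Jet:
    px: float
    py: float
    pz: float
    e: float
    parents: Optional[Tuple["Jet", "Jet"]] = None  # the two subjets merged last

    @property
    def pt(self):
        return math.hypot(self.px, self.py)

    @property
    def mass(self):
        m2 = self.e ** 2 - self.px ** 2 - self.py ** 2 - self.pz ** 2
        return math.sqrt(max(m2, 0.0))

    @property
    def rapidity(self):
        return 0.5 * math.log((self.e + self.pz) / (self.e - self.pz))

    @property
    def phi(self):
        return math.atan2(self.py, self.px)


def particle(pt, y, phi):
    """Massless particle from pt, rapidity and azimuth."""
    return Jet(pt * math.cos(phi), pt * math.sin(phi),
               pt * math.sinh(y), pt * math.cosh(y))


def merge(a, b):
    return Jet(a.px + b.px, a.py + b.py, a.pz + b.pz, a.e + b.e, parents=(a, b))


def delta_r2(a, b):
    dphi = abs(a.phi - b.phi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (a.rapidity - b.rapidity) ** 2 + dphi ** 2


def mass_drop(jet, mu=0.67, y_cut=0.09):
    """Return (j, j1, j2) for the first splitting passing the MD and y cuts, else None."""
    while jet.parents is not None:
        # Step 1: undo the last clustering; j1 is the heavier subjet.
        j1, j2 = sorted(jet.parents, key=lambda s: s.mass, reverse=True)
        if jet.mass > 0.0:
            y = min(j1.pt, j2.pt) ** 2 * delta_r2(j1, j2) / jet.mass ** 2
            # Step 2: mass drop and symmetry requirement.
            if j1.mass < mu * jet.mass and y > y_cut:
                return jet, j1, j2
        # Step 3: otherwise follow the heavier subjet.
        jet = j1
    return None


# Hand-built tree: two hard quarks plus one soft wide-angle particle merged last.
b1 = particle(100.0, 0.0, 0.575)
b2 = particle(100.0, 0.0, -0.575)
soft = particle(5.0, 0.9, 1.5)
higgs = merge(b1, b2)
fat_jet = merge(higgs, soft)
found = mass_drop(fat_jet)
print(round(found[0].mass, 1))
```

In this toy tree the outermost splitting (hard system plus soft particle) fails the mass-drop condition, so the loop discards the soft particle and stops at the symmetric two-quark splitting.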

The final jet j is to be considered as the candidate Higgs
boson if both $j_1$ and $j_2$ have b tags. One can then identify
$R_{b\bar{b}}$ with $\Delta R_{j_1 j_2}$. The effective size of jet j will thus be
just sufficient to contain the QCD radiation from the
Higgs decay, which, because of angular ordering [12, 13, 14],
will almost entirely be emitted in the two angular
cones of size $R_{b\bar{b}}$ around the b quarks.

1 Note also that this $y_{\rm cut}$ is related to, but not the same as, that
used to calculate the splitting scale in [5, 6], which takes the jet
$p_T$ as the reference scale rather than the jet mass.

Jet definition           $\sigma_S$/fb   $\sigma_B$/fb   $S/\sqrt{B \cdot {\rm fb}}$
C/A, R = 1.2, MD-F       0.57            0.51            0.80
$k_t$, R = 1.0, $y_{\rm cut}$     0.19            0.74            0.22
SISCone, R = 0.8         0.49            1.33            0.42

TABLE I: Cross section for signal and the Z+jets background
in the leptonic Z channel for $200 < p_{TZ}/{\rm GeV} < 600$ and
$110 < m_J/{\rm GeV} < 125$, with perfect b-tagging; shown for
our jet definition, and other standard ones at near optimal R
values.

The two parameters $\mu$ and $y_{\rm cut}$ may be chosen independently
of the Higgs mass and $p_T$. Taking $\mu \gtrsim 1/\sqrt{3}$
ensures that if, in its rest frame, the Higgs decays to a
Mercedes $b\bar{b}g$ configuration, then it will still trigger the
mass drop condition (we actually take $\mu = 0.67$). The cut
on $y \simeq \min(z_{j_1}, z_{j_2})/\max(z_{j_1}, z_{j_2})$ eliminates the asymmetric
configurations that most commonly generate significant
jet masses in non-b or single-b jets, due to the
soft gluon divergence. It can be shown that the maximum
$S/\sqrt{B}$ for a Higgs boson compared to mistagged
light jets is to be obtained with $y_{\rm cut} \simeq 0.15$. Since we
have mixed tagged and mistagged backgrounds, we use a
slightly smaller value, $y_{\rm cut} = 0.09$.
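The bound $\mu \gtrsim 1/\sqrt{3}$ can be checked numerically. The sketch below assumes three massless partons of equal energy, 120 degrees apart in the decay rest frame, and treats any two of them as the heavier subjet $j_1$; the function name and the normalisation of the total mass to 1 are choices made for the illustration.

```python
import math


def mercedes_pair_mass_fraction():
    """m_pair / m_total for three equal-energy massless partons 120 degrees apart."""
    energy = 1.0 / 3.0  # each parton carries a third of the total mass (set to 1)
    partons = []
    for k in range(3):
        angle = 2.0 * math.pi * k / 3.0
        partons.append((energy * math.cos(angle), energy * math.sin(angle),
                        0.0, energy))
    # Invariant mass of any two of the three partons.
    px, py, pz, e = (sum(c) for c in zip(partons[0], partons[1]))
    return math.sqrt(e ** 2 - px ** 2 - py ** 2 - pz ** 2)


fraction = mercedes_pair_mass_fraction()
print(round(fraction, 4), round(1.0 / math.sqrt(3.0), 4))
```

Under these assumptions the two-parton system has a mass of $m_h/\sqrt{3} \approx 0.577\, m_h$, so the requirement $m_{j_1} < \mu m_j$ is satisfied for such a configuration only if $\mu$ exceeds $1/\sqrt{3}$, as it does for $\mu = 0.67$.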

In practice the above procedure is not yet optimal
for LHC at the transverse momenta of interest, $p_T \sim 200 - 300$ GeV
because, from eq. (1), $R_{b\bar{b}} \gtrsim 2m_h/p_T$ is
still quite large and the resulting Higgs mass peak is subject
to significant degradation from the underlying event
(UE), which scales as $R_{b\bar{b}}^4$ [15]. A second novel element
of our analysis is to filter the Higgs neighbourhood. This
involves resolving it on a finer angular scale, $R_{\rm filt} < R_{b\bar{b}}$,
and taking the three hardest objects (subjets) that appear;
thus one captures the dominant $O(\alpha_s)$ radiation
from the Higgs decay, while eliminating much of the UE
contamination. We find $R_{\rm filt} = \min(0.3, R_{b\bar{b}}/2)$ to be
rather effective. We also require the two hardest of the
subjets to have the b tags.
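The filtering step can be sketched as follows. This is a toy Python illustration, not the analysis code: the constituents of the Higgs neighbourhood are written down by hand, the reclustering is a simple $O(N^3)$ C/A loop with four-momentum addition, and all function names are assumptions.

```python
import math
from itertools import combinations


def massless(pt, y, phi):
    return (pt * math.cos(phi), pt * math.sin(phi),
            pt * math.sinh(y), pt * math.cosh(y))


def pt(p):
    return math.hypot(p[0], p[1])


def mass(p):
    return math.sqrt(max(p[3] ** 2 - p[0] ** 2 - p[1] ** 2 - p[2] ** 2, 0.0))


def delta_r2(a, b):
    ya = 0.5 * math.log((a[3] + a[2]) / (a[3] - a[2]))
    yb = 0.5 * math.log((b[3] + b[2]) / (b[3] - b[2]))
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (ya - yb) ** 2 + dphi ** 2


def cluster(particles, radius):
    """Toy C/A clustering: merge the closest pair until all pairs have dR > radius."""
    jets = list(particles)
    while len(jets) > 1:
        i, j = min(combinations(range(len(jets)), 2),
                   key=lambda ij: delta_r2(jets[ij[0]], jets[ij[1]]))
        if delta_r2(jets[i], jets[j]) > radius ** 2:
            break
        merged = tuple(a + b for a, b in zip(jets[i], jets[j]))
        jets = [p for k, p in enumerate(jets) if k not in (i, j)] + [merged]
    return jets


def filter_neighbourhood(constituents, r_bb, n_keep=3):
    """Recluster at R_filt = min(0.3, r_bb / 2) and keep the n_keep hardest subjets."""
    r_filt = min(0.3, r_bb / 2.0)
    subjets = sorted(cluster(constituents, r_filt), key=pt, reverse=True)
    kept = subjets[:n_keep]
    return tuple(sum(c) for c in zip(*kept)), kept


# Two b quarks, one gluon, and three soft particles standing in for the UE.
constituents = [massless(100.0, 0.0, 0.5), massless(90.0, 0.0, -0.5),
                massless(20.0, 0.4, 0.1), massless(2.0, -0.6, 0.0),
                massless(1.5, 0.7, -0.4), massless(1.0, -0.5, 0.9)]
unfiltered = tuple(sum(c) for c in zip(*constituents))
filtered, kept = filter_neighbourhood(constituents, r_bb=1.0)
print(round(mass(unfiltered), 1), round(mass(filtered), 1))
```

Keeping three subjets rather than two retains the hardest gluon emission from the decay, while the soft particles, which can only increase the reconstructed mass, are dropped.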

The overall procedure is sketched in Fig. 1. We illustrate
its effectiveness by showing in table I (a) the
cross section for identified Higgs decays in HZ production,
with $m_h = 115$ GeV and a reconstructed mass required
to be in a moderately narrow (but experimentally
realistic) mass window, and (b) the cross section
for background $Zb\bar{b}$ events in the same mass window.
Our results (C/A MD-F) are compared to those for the
$k_t$ algorithm with the same $y_{\rm cut}$ and the SISCone [16]
algorithm based just on the jet mass. The $k_t$ algorithm
does well on background rejection, but suffers in mass
resolution, leading to a low signal; SISCone takes in less
UE so gives good resolution on the signal, however, because
it ignores the underlying substructure, fares poorly
on background rejection. C/A MD-F performs well both
on mass resolution and background rejection.

The above results were obtained with HERWIG 6.510 [17, 18]
with Jimmy 4.31 [19] for the underlying
event, which has been used throughout the subsequent
analysis. The signal reconstruction was also
cross-checked using Pythia 6.403 [20]. In both cases
the underlying event model was chosen in line with the
tunes currently used by ATLAS and CMS (see for example
[21] and footnote 2). The leading-logarithmic parton shower
approximation used in these programs has been shown
to model jet substructure well in a wide variety of processes
[23, 24, 25, 26, 27, 28]. For this analysis, signal
samples of WH, ZH were generated, as well as
WW, ZW, ZZ, Z + jet, W + jet, $t\bar{t}$, single top and dijets
to study backgrounds. All samples correspond to a luminosity
> 30 fb$^{-1}$, except for the lowest $p_T^{\min}$ dijet sample,
where the cross section makes this impractical. In
this case an assumption was made that the selection efficiency
of a leptonically-decaying boson factorises from
the hadronic Higgs selection. This assumption was tested
and is a good approximation in the signal region of the
mass plot, though correlations are significant at lower
masses.

The leading order (LO) estimates of the cross-section 
were checked by comparing to next-to-leading order 
(NLO) results. High-$p_T$ VH and $Vb\bar{b}$ cross sections were
obtained with MCFM [29, 30] and found to be about 1.5
times the LO values for the two signal and the $Z^0 b\bar{b}$ channels
(confirmed with MC@NLO v3.3 for the signal [31]),
while the $W^\pm b\bar{b}$ channel has a K-factor closer to 2.5 (as
observed also at low-$p_T$ in [13]; see footnote 3). The main other
background, $t\bar{t}$ production, has a K-factor of about 2 (found
comparing the HERWIG total cross section to [32]). This
suggests that our final LO-based signal/$\sqrt{\rm background}$
estimates ought not to be too strongly affected by higher
order corrections, though further detailed NLO studies 
would be of value. 

Let us now turn to the details of the event selection.
The candidate Higgs jet should have a $p_T$ greater than
some $\hat{p}_T^{\min}$. The jet R-parameter values commonly used
by the experiments are typically in the range 0.4 - 0.7.
Increasing the R-parameter increases the fraction of contained
Higgs decays. Scanning the region 0.6 < R < 1.6
for various values of $\hat{p}_T^{\min}$ indicates an optimum value
around R = 1.2 with $\hat{p}_T^{\min} = 200$ GeV.

Three subselections are used for vector bosons: (a) An
$e^+e^-$ or $\mu^+\mu^-$ pair with an invariant mass 80 GeV <
m < 100 GeV and $p_T > \hat{p}_T^{\min}$. (b) Missing transverse
momentum $> \hat{p}_T^{\min}$. (c) Missing transverse momentum
> 30 GeV plus a lepton (e or $\mu$) with $p_T > 30$ GeV,
consistent with a W of nominal mass with $p_T > \hat{p}_T^{\min}$. It
may also be possible, by using similar techniques to reconstruct
hadronically decaying bosons, to recover signal
from these events. This is a topic left for future study.

2 The non-default parameter settings are: PRSOF=0,
JMRAD(73)=1.8, PTJIM=4.9 GeV, JMUEO=1, with
CTEQ6L [22] PDFs.

3 For the $Vb\bar{b}$ backgrounds these results hold as long as both the
vector boson and $b\bar{b}$ jet have a high $p_T$; relaxing the requirement
on $p_{TV}$ leads to enhanced K-factors from electroweak
double-logarithms.

[Figure 2: panels (a)-(d); horizontal axes: Mass (GeV)]

FIG. 2: Signal and background for a 115 GeV SM Higgs
simulated using HERWIG, C/A MD-F with R = 1.2 and
$p_T > 200$ GeV, for 30 fb$^{-1}$. The b tag efficiency is assumed
to be 60% and a mistag probability of 2% is used. The $q\bar{q}$
sample includes dijets and $t\bar{t}$. The vector boson selections
for (a), (b) and (c) are described in the text, and (d) shows
the sum of all three channels. The errors reflect the statistical
uncertainty on the simulated samples, and correspond to
integrated luminosities > 30 fb$^{-1}$.

To reject backgrounds we require that there be no leptons
with $|\eta| < 2.5$, $p_T > 30$ GeV apart from those used
to reconstruct the leptonic vector boson, and no b-tagged
jets in the range $|\eta| < 2.5$, $p_T > 50$ GeV apart from the
Higgs candidate. For channel (c), where the $t\bar{t}$ background
is particularly severe, we require that there are
no additional jets with $|\eta| < 3$, $p_T > 30$ GeV. The
rejection might be improved if this cut were replaced by a
specific top veto [2]. However, without applying the subjet
mass reconstruction to all jets, the mass resolution
for R = 1.2 is inadequate.
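The veto logic of this selection can be written compactly. The sketch below is illustrative only: the event layout (lists of dicts with `pt` in GeV and `eta`) and the function name are hypothetical, and the lists are assumed to contain only objects not already used for the vector boson or the Higgs candidate.

```python
def passes_vetoes(channel, extra_leptons, extra_bjets, extra_jets):
    """Apply the background vetoes to objects outside the V and Higgs candidates.

    channel is 'a', 'b' or 'c'; each object is a dict with 'pt' (GeV) and 'eta'.
    """
    def any_in(objects, max_abs_eta, min_pt):
        return any(abs(o["eta"]) < max_abs_eta and o["pt"] > min_pt
                   for o in objects)

    if any_in(extra_leptons, 2.5, 30.0):   # no further central leptons
        return False
    if any_in(extra_bjets, 2.5, 50.0):     # no further central b-tagged jets
        return False
    if channel == "c" and any_in(extra_jets, 3.0, 30.0):
        return False                        # jet veto only for channel (c)
    return True


# A 40 GeV central jet is tolerated in channel (a) but vetoed in channel (c).
jet = {"pt": 40.0, "eta": 1.0}
print(passes_vetoes("a", [], [], [jet]), passes_vetoes("c", [], [], [jet]))
```

Restricting the jet veto to channel (c) reflects the statement above that the $t\bar{t}$ background is particularly severe there.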

The results for R = 1.2, $\hat{p}_T^{\min} = 200$ GeV are shown
in Fig. 2 for $m_H = 115$ GeV. The Z peak from ZZ and
WZ events is clearly visible in the background, providing
a critical calibration tool. Relaxing the b-tagging selection
would provide greater statistics for this calibration,
and would also make the W peak visible. The major
backgrounds are from W or Z+jets, and (except for the
$HZ(Z \to l^+l^-)$ case), $t\bar{t}$.

[Figure 3: panels (a) and (b); horizontal axes: b Mistag Probability, Higgs Mass (GeV)]

FIG. 3: Estimated sensitivity for 30 fb$^{-1}$ under various different
sets of cuts and assumptions (a) for $m_H = 115$ GeV
as a function of the mistag probability for b-subjets and (b)
as a function of Higgs mass for the b-tag efficiency (mistag
rates) shown in the legend. Significance is estimated as
signal/$\sqrt{\rm background}$ in the peak region.

Combining the three sub-channels in Fig. 2(d) and summing
signal and background over the two bins in the
range 112-128 GeV, the Higgs is seen with a significance
of $4.5\sigma$ ($8.2\sigma$ for 100 fb$^{-1}$). The intrinsic resolution of
the jet mass at the particle level would allow finer binning
and greater significance. However, studies [33, 34]
using parameterised simulations of the ATLAS detector
indicate that detector resolution would prohibit this.
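The relation between the two quoted significances follows from the purely statistical estimate signal/$\sqrt{\rm background}$, in which signal and background event counts both grow linearly with integrated luminosity. A minimal Python sketch, with function names chosen for the illustration:

```python
import math


def significance(sigma_s_fb, sigma_b_fb, lumi_fb):
    """S / sqrt(B) for cross sections in fb and an integrated luminosity in fb^-1."""
    return sigma_s_fb * lumi_fb / math.sqrt(sigma_b_fb * lumi_fb)


def rescale(sig, lumi_from_fb, lumi_to_fb):
    """Rescale a statistical significance between two integrated luminosities."""
    return sig * math.sqrt(lumi_to_fb / lumi_from_fb)


# 4.5 sigma at 30 fb^-1 rescaled to 100 fb^-1.
print(round(rescale(4.5, 30.0, 100.0), 1))
# First row of Table I (C/A MD-F) at unit luminosity.
print(round(significance(0.57, 0.51, 1.0), 2))
```

The same function reproduces the last column of Table I from its two cross-section columns. Systematic uncertainties on the background are not included in this estimate.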

The b-tagging and mistag probabilities are critical parameters
for this analysis, and no detailed study has
been published of tagging two high-$p_T$ b subjets. Values
used by experiments for single-tag probabilities range up
to 70% for the efficiency and down to 1% for mistags.
Results for 70% and 60% efficiency are summarised in
Fig. 3(a), as a function of the mistag probability.
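To see why the double-tag requirement makes these two numbers so important, one can tabulate the probability that both subjets are tagged for different true flavour content. The sketch below assumes that the two subjets are tagged independently and that a single mistag rate applies to every non-b subjet; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def double_tag_weights(b_eff, mistag):
    """Probability that both subjets are tagged, by true flavour content.

    Assumes independent tagging of the two subjets (an assumption of this sketch).
    """
    return {
        "b + b": b_eff ** 2,
        "b + non-b": b_eff * mistag,
        "non-b + non-b": mistag ** 2,
    }


# Values used for Fig. 2: 60% efficiency and 2% mistag probability.
for content, weight in double_tag_weights(0.60, 0.02).items():
    print(content, weight)
```

Under these assumptions a true $b\bar{b}$ pair survives with probability 0.36, while a jet with no b content survives with probability 0.0004, so backgrounds with genuine b quarks are only suppressed as much as the signal.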



There is a trade-off between rising cross-section and
falling fraction of contained decays (as well as rising backgrounds)
as $\hat{p}_T^{\min}$ is reduced. As an example of the dependence
on this trade-off, we show the sensitivity for
$\hat{p}_T^{\min} = 300$ GeV, R = 0.7 in Fig. 3(a).

The significance falls for higher Higgs masses, as shown
in Fig. 3(b), but values of $3\sigma$ or above seem achievable up
to $m_H = 130$ GeV.

In addition to the b-tagging, the effects of pile-up, intrinsic
resolution and granularity of the detector will all
have an impact. Several ideas exist to improve some 
of these, and initial studies with realistic detector sim- 
ulations indicate that the efficiencies and resolutions as- 
sumed here are not unreasonable, though the exact re- 
quirements of our analysis have not been studied with 
such tools. 

We conclude that subjet techniques have the potential
to transform the high-$p_T$ $WH, ZH (H \to b\bar{b})$ channel into
one of the best channels for discovery of a low mass Standard
Model Higgs at the LHC. This channel could also
provide unique information on the coupling of the Higgs 
boson separately to W and Z bosons. Realising this po- 
tential is a challenge that merits further experimental 
study and complementary theoretical investigations. 

This work was supported by the UK STFC, and by 
grant ANR-05-JCJC-0046-01 from the French Agence 
Nationale de la Recherche. GPS thanks Matteo Cacciari 
for collaboration in extending FastJet to provide the fea- 
tures used in this analysis. 



[1] M. W. Grunewald, arXiv:0709.3744 [hep-ph], in Proceedings of Europhysics Conference on High Energy Physics (EPS-HEP2007), Manchester, England, 19-25 Jul 2007.
[2] ATLAS, Detector physics performance technical design report, CERN/LHCC/99-14/15, 1999.
[3] CMS, A. Ball et al., J. Phys. G34, 995 (2007).
[4] M. H. Seymour, Z. Phys. C62, 127 (1994).
[5] J. M. Butterworth, B. E. Cox and J. R. Forshaw, Phys. Rev. D 65 (2002), [hep-ph/0201098].
[6] J. M. Butterworth, J. R. Ellis and A. R. Raklev, JHEP 05, 033 (2007), [hep-ph/0702150].
[7] Y. L. Dokshitzer, G. D. Leder, S. Moretti and B. R. Webber, JHEP 08, 001 (1997), [hep-ph/9707323].
[8] M. Wobisch and T. Wengler, hep-ph/9907280.
[9] S. Catani, Y. L. Dokshitzer, M. H. Seymour and B. R. Webber, Nucl. Phys. B406, 187 (1993).
[10] S. D. Ellis and D. E. Soper, Phys. Rev. D48, 3160 (1993), [hep-ph/9305266].
[11] M. Cacciari and G. P. Salam, Phys. Lett. B641, 57 (2006), [hep-ph/0512210].
[12] A. H. Mueller, Phys. Lett. B104, 161 (1981).
[13] B. I. Ermolaev and V. S. Fadin, JETP Lett. 33, 269 (1981).
[14] A. Bassetto, M. Ciafaloni and G. Marchesini, Phys. Rept. 100, 201 (1983).
[15] M. Dasgupta, L. Magnea and G. P. Salam, arXiv:0712.3014 [hep-ph].
[16] G. P. Salam and G. Soyez, JHEP 05, 086 (2007), [arXiv:0704.0292 [hep-ph]].
[17] G. Corcella et al., hep-ph/0210213.
[18] G. Corcella et al., JHEP 01, 010 (2001), [hep-ph/0011363].
[19] J. M. Butterworth, J. R. Forshaw and M. H. Seymour, Z. Phys. C72, 637 (1996), [hep-ph/9601371].
[20] T. Sjostrand, S. Mrenna and P. Skands, JHEP 05, 026 (2006), [hep-ph/0603175].
[21] S. Alekhin et al., hep-ph/0601012.
[22] J. Pumplin et al., JHEP 07, 012 (2002), [hep-ph/0201195].
[23] D0, V. M. Abazov et al., Phys. Rev. D65, 052008 (2002), [hep-ex/0108054].
[24] CDF, D. Acosta et al., Phys. Rev. D71, 112002 (2005), [hep-ex/0505013].
[25] ZEUS, S. Chekanov et al., Nucl. Phys. B700, 3 (2004), [hep-ex/0405065].
[26] OPAL, G. Abbiendi et al., Eur. Phys. J. C37, 25 (2004), [hep-ex/0404026].
[27] OPAL, G. Abbiendi et al., Eur. Phys. J. C31, 307 (2003), [hep-ex/0301013].
[28] ALEPH, D. Buskulic et al., Phys. Lett. B384, 353 (1996).
[29] R. K. Ellis and S. Veseli, Phys. Rev. D60, 011501 (1999), [hep-ph/9810489].
[30] J. Campbell, R. K. Ellis and D. L. Rainwater, Phys. Rev. D68, 094021 (2003), [hep-ph/0308195].
[31] S. Frixione and B. R. Webber, JHEP 06, 029 (2002), [hep-ph/0204244].
[32] P. M. Nadolsky et al., arXiv:0802.0007 [hep-ph].
[33] S. Allwood, Manchester PhD thesis, 2006.
[34] E. Stefanidis, UCL PhD thesis, 2007.



